-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove uses of pkg_resources in non-namespace packages. #7902
Remove uses of pkg_resources in non-namespace packages. #7902
Conversation
In protocolbuffers#713 and protocolbuffers#1296, the `google` package in protobuf sources was found to cause conflicts with other Google projects, because it was not properly configured as a namespace package [1]. The initial fix in 786f80f addressed part of the issue, and protocolbuffers#1298 fixed the rest. However, 786f80f (the initial fix) also made `google.protobuf` and `google.protobuf.pyext` into namespace packages. This was not correct: they are both regular, non-namespace, sub-subpackages. However (still), the follow-up protocolbuffers#1298 did not nominate them as namespace packages, so the namespace registration behavior has remained, but without benefit. This change removes the unnecessary namespace registration, which has substantial overhead, thus reducing startup time substantially when using protobufs. Because this change affects the import internals, quantifying the overhead requires a full tear-down/start-up of the Python interpreter. So, to capture the full cost for every run, I measured the time to launching a _fresh_ Python instance in a subprocess, varying the imports and code under test. In other words, I used `timeit` to measure the time to launch a _fresh_ Python subprocess which actually performs the imports. * Reference: normal Python startup (i.e., don't import protobuf at all). ``` % python3 -m timeit -s 'import subprocess' -r 3 -n 10 'subprocess.call(["python3", "-c", "pass"])' 10 loops, best of 3: 27.1 msec per loop ``` * Baseline: cost to import `google.protobuf.descriptor`, with extraneous namespace packages. ``` % python3 -m timeit -s 'import subprocess' -r 3 -n 10 'subprocess.call(["python3", "-c", "import google.protobuf.descriptor"])' 10 loops, best of 3: 133 msec per loop ``` * This change: cost to import `google.protobuf.descriptor`, without extraneous namespace packages. ``` % python3 -m timeit -s 'import subprocess' -r 3 -n 10 'subprocess.call(["python3", "-c", "import google.protobuf.descriptor"])' 10 loops, best of 3: 43.1 msec per loop ``` [1]: https://packaging.python.org/guides/packaging-namespace-packages/
I don't quite understand why, but I believe this change may have reintroduced the conflict with different packages sharing the |
Strictly speaking, this PR removed such a conflict, so that protobuf doesn't need to be the first thing imported under
Those logs indicate that If that's the case, then google-auth can be fixed... but it would also indicate that the Bazel runner may be installing the package incorrectly. |
(The same comment applies to #7908) @dlj-NaN Why was this migrated to native-namespace-packages instead of pkgutil-style-namespace-packages? It seems like doing so would not break as many people as it did (including googleapis/googleapis build, which is why I'm here). Grpc got broken by this: grpc/grpc#25131 (comment) It looks like, at least for the bazel builds, the teams are "fixing" it by adding patches to the protobuf imports, which is not sustainable. I've tried converting the protobuf package to pkgutil-style-namespace-packages instead of a native package and it seems fixing at least the googleapis' problem (there is a good chance that it will make patching unnecessary for other teams as well). I understand that the ultimate solution would be migrating the other google-namespaced packages (like google-auth) but untill that happens there will be a lot of friction and patches, which would be nice to avoid is possible.
Looks like the documentation strictly advises against trying to migrate an existing package |
…kaged as native namespace package) More details about Python namespace packaging here: https://packaging.python.org/guides/packaging-namespace-packages/#native-namespace-packages The protobuf changes, which made this fix necessary are here: protocolbuffers/protobuf#7902 protocolbuffers/protobuf#7908 The tracking bug for this issue (probably not the only one) googleapis/gapic-generator#3334 This is only part of the fix, for the proper fix other google-namespaced python packages must be changed as well.
…kaged as native namespace package) (#753) More details about Python namespace packaging here: https://packaging.python.org/guides/packaging-namespace-packages/#native-namespace-packages The protobuf changes, which made this fix necessary are here: protocolbuffers/protobuf#7902 protocolbuffers/protobuf#7908 The tracking bug for this issue (probably not the only one) googleapis/gapic-generator#3334 This is only part of the fix, for the proper fix other google-namespaced python packages must be migrated to [pkgutil-style-namespace-packages](https://packaging.python.org/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages) and make sure they **do not** have `namespace_packages` in their `setup.py` file (an artifact from the legacy `pkg_resources-style`packages)
@vam-google : per the PR text, there were other packages were broken by the pkgutil-style namespacing (and not using Bazel). That was my overriding concern. Looking at googleapis/gapic-generator-python#753, I don't see any details of the error you saw, so I can really only guess at specifics. It looks like the upshot is that protobuf was, at one point, winning a global-transitive-import-order race, but only by chance, and not by right -- the race depends on the global ordering of imports, transitively. (Python imports are usually notionally topological, but in this case the import machinery is being hooked somewhere, so ordering matters.) Incidentally, between |
In #713 and #1296, the
google
package in protobuf sources was foundto cause conflicts with other Google projects, because it was not
properly configured as a namespace package. The initial fix in
786f80f addressed part of the issue, and #1298 fixed the rest.
However, 786f80f (the initial fix) also made
google.protobuf
andgoogle.protobuf.pyext
into namespace packages. This was not correct:they are both regular, non-namespace, sub-subpackages.
However (still), the follow-up #1298 only nominated
google
as anamespace packages, leaving the namespace registration behavior
for the other packages, without benefit (or need).
This change removes the unnecessary namespace registration, which has
substantial overhead, thus reducing startup time substantially when
using protobufs.
Because this change affects the import internals, quantifying the
overhead requires a full tear-down/start-up of the Python interpreter.
So, to capture the full cost for every run, I measured the time to
launching a fresh Python instance in a subprocess, varying the
imports and code under test. In other words, I used
timeit
tomeasure the time to launch a fresh Python subprocess which actually
performs the imports.
Reference: normal Python startup (i.e., don't import protobuf at all).
Baseline: cost to import
google.protobuf.descriptor
, withextraneous namespace packages.
This change: cost to import
google.protobuf.descriptor
, withoutextraneous namespace packages.